# Video Description Generation
Tarsier 7b
Tarsier-7b is an open-source, large-scale video-language model from the Tarsier series. It specializes in generating high-quality video descriptions and has strong general video-understanding capabilities.
Video-to-Text
Transformers

omni-research
635
23
Video Blip Flan T5 Xl Ego4d
MIT
VideoBLIP is an enhanced version of BLIP-2 that can process video data, using Flan-T5-XL as its backbone language model.
Video-to-Text
Transformers
English

kpyu
40
3